2023-11-14
Regression Assumptions (Chapter 30 in Course Notes)
On Contingency Tables (Chapter 28 in Course Notes)
see Course Notes, Section 30
plot(model, which = c(1:3,5))
Call:
lm(formula = y ~ x1 + x2 + x3, data = sim0)
Residuals:
Min 1Q Median 3Q Max
-3.14553 -0.68079 0.08096 0.69216 2.65265
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.122852 0.348584 0.352 0.725
x1 0.285539 0.014211 20.093 <2e-16 ***
x2 -0.204908 0.005828 -35.159 <2e-16 ***
x3 0.413308 0.007172 57.631 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.007 on 196 degrees of freedom
Multiple R-squared: 0.9589, Adjusted R-squared: 0.9583
F-statistic: 1524 on 3 and 196 DF, p-value: < 2.2e-16
The residual plots are shown on the next page (note: setting #| fig-height: 7 in the code chunk is helpful).
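In plot(model, which = c(1:3,5)), the which argument picks four of the six diagnostic plots available for an lm fit: 1 is Residuals vs Fitted, 2 is the normal Q-Q plot of residuals, 3 is Scale-Location and 5 is Residuals vs Leverage (4 and 6 are Cook's distance displays). The sketch below is self-contained, so it simulates a hypothetical stand-in for sim0 with 200 rows (matching the 196 residual degrees of freedom above); the name sim_demo, the seed, the predictor distributions and the coefficients are all made up for illustration.

```r
# Hypothetical stand-in for sim0: 200 rows, so a 3-predictor lm has 196 residual df
set.seed(431)                          # arbitrary seed, for reproducibility
n <- 200
sim_demo <- data.frame(
  x1 = rnorm(n, mean = 20, sd = 5),
  x2 = rnorm(n, mean = 50, sd = 12),
  x3 = rnorm(n, mean = 30, sd = 10)
)
# Made-up true coefficients plus standard normal noise
sim_demo$y <- 0.3 * sim_demo$x1 - 0.2 * sim_demo$x2 + 0.4 * sim_demo$x3 +
  rnorm(n)

model <- lm(y ~ x1 + x2 + x3, data = sim_demo)

# 2 x 2 layout: Residuals vs Fitted (1), Normal Q-Q (2),
# Scale-Location (3), Residuals vs Leverage (5)
old_par <- par(mfrow = c(2, 2))
plot(model, which = c(1:3, 5))
par(old_par)                           # restore the previous layout
```

Because the data are simulated from a linear model with normal errors, these plots should look "well-behaved", which makes them a useful baseline for comparing against the real-data plots.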
Importance of Assumptions (1-3):
Importance of Assumptions (4-6):
“No collinearity” is not a regression assumption, but if we see substantial collinearity, we are inclined to consider dropping some of the variables, or combining them (height and weight may be highly correlated, height and BMI may be less so).
A variance inflation factor (VIF) greater than 5 is a clear indication of collinearity. We’d like correlation between the predictors to inflate the variances only slightly (that is, VIF not much larger than 1), to facilitate interpretation.
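The VIF for predictor \(j\) is \(1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing that predictor on all of the other predictors, so a VIF of 5 corresponds to \(R_j^2 = 0.8\). A minimal base-R sketch of that calculation follows; the function vif_by_hand and the simulated height, weight and age data are hypothetical, built only to echo the height-and-weight example above.

```r
# VIF for each predictor: 1 / (1 - R^2), where R^2 comes from
# regressing that predictor on all the other predictors
vif_by_hand <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    f <- reformulate(others, response = p)
    r2 <- summary(lm(f, data = data))$r.squared
    1 / (1 - r2)
  })
}

# Hypothetical data: weight is built almost entirely from height; age is unrelated
set.seed(2023)
n <- 100
height <- rnorm(n, mean = 170, sd = 10)
demo <- data.frame(
  height = height,
  weight = 0.9 * height - 80 + rnorm(n, sd = 3),
  age    = rnorm(n, mean = 50, sd = 12)
)

v <- vif_by_hand(demo, c("height", "weight", "age"))
round(v, 2)
```

With weight driven mostly by height, both of those predictors should show VIFs well above 5, while age stays close to 1.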
The best way to tell whether an alternative model has improved the situation is to actually fit and compare the two models, looking in particular at:
Options include:
Options include:
Is one of the regression assumptions violated?
This isn’t easy. We’ll do three, and then regroup.
For those of you playing along at home…
Let’s try three more…
Develop an effective model. (?) (!)
This table displays the count of patients who show complete, partial, or no response after treatment with either active medication or a placebo in a study of 100 patients…
| Group | None | Partial | Complete |
|---|---|---|---|
| Active | 8 | 24 | 20 |
| Placebo | 12 | 26 | 10 |
Is there a statistically detectable association here, at \(\alpha = 0.10\)?
The Pearson \(\chi^2\) test assumes the null hypothesis is true (rows and columns are independent). That null hypothesis is a model for our data. How does it work?
Here’s the table, with marginal totals added.
| – | None | Partial | Complete | TOTAL |
|---|---|---|---|---|
| Active | 8 | 24 | 20 | 52 |
| Placebo | 12 | 26 | 10 | 48 |
| TOTAL | 20 | 50 | 30 | 100 |
The test needs to estimate the expected frequency in each of the six cells under the assumption of independence. If the rows and columns were independent, what is the expected count in the Active/None cell?
| – | None | Partial | Complete | TOTAL |
|---|---|---|---|---|
| Active | – | – | – | 52 |
| Placebo | – | – | – | 48 |
| TOTAL | 20 | 50 | 30 | 100 |
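If the rows and columns were independent, then the probability of a patient landing in the Active/None cell would be the product of the marginal probabilities, \(\frac{52}{100} \times \frac{20}{100} = 0.104\), so the expected count in that cell would be \(100 \times 0.104 = 10.4\).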
So, can we fill in the expected frequencies under our independence model?
| – | None | Partial | Complete | TOTAL |
|---|---|---|---|---|
| Active | 8 (10.4) | 24 (26.0) | 20 (15.6) | 52 |
| Placebo | 12 (9.6) | 26 (24.0) | 10 (14.4) | 48 |
| TOTAL | 20 | 50 | 30 | 100 |
\[ \mbox{Expected Frequency} = \frac{\mbox{Row total} \times \mbox{Column total}}{\mbox{Grand Total}} \]
This assumes that the independence model holds: the probability of being in a particular column is exactly the same in each row, and vice versa.
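The Pearson statistic then adds up, over all six cells, the squared gap between the observed and expected counts, each scaled by its expected count. A worked version for this table (the three-decimal rounding of each term is ours):

\[ X^2 = \sum_{\mbox{cells}} \frac{(\mbox{Observed} - \mbox{Expected})^2}{\mbox{Expected}} \]

\[ X^2 = \frac{(8-10.4)^2}{10.4} + \frac{(24-26.0)^2}{26.0} + \frac{(20-15.6)^2}{15.6} + \frac{(12-9.6)^2}{9.6} + \frac{(26-24.0)^2}{24.0} + \frac{(10-14.4)^2}{14.4} \approx 0.554 + 0.154 + 1.241 + 0.600 + 0.167 + 1.344 \approx 4.06 \]

This is compared to a \(\chi^2\) distribution with \((2-1)(3-1) = 2\) degrees of freedom, which gives a p-value of about 0.13.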
| – | None | Partial | Complete | TOTAL |
|---|---|---|---|---|
| Active | 8 (10.4) | 24 (26.0) | 20 (15.6) | 52 |
| Placebo | 12 (9.6) | 26 (24.0) | 10 (14.4) | 48 |
| TOTAL | 20 | 50 | 30 | 100 |
We’ll put the table into a matrix in R. Here’s one approach…
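One possible approach, sketched below, enters the counts row by row with matrix(..., byrow = TRUE). The object name T1 matches the Fisher output later on; the dimension labels Group and Response are illustrative choices, not necessarily those used in class.

```r
# Enter the 2 x 3 table of counts row by row
T1 <- matrix(c( 8, 24, 20,
               12, 26, 10),
             nrow = 2, byrow = TRUE,
             dimnames = list(Group = c("Active", "Placebo"),
                             Response = c("None", "Partial", "Complete")))

T1                    # check the layout against the table above

# Pearson test: Yates' continuity correction only applies to 2 x 2 tables,
# so none is used here
res <- chisq.test(T1)
res
res$expected          # expected counts under independence (10.4, 26, 15.6, ...)
```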
What is the conclusion?
Yes, but… if the Pearson assumptions don’t hold, then Fisher’s test is not generally an improvement.
Fisher's Exact Test for Count Data
data: T1
p-value = 0.1358
alternative hypothesis: two.sided
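The Fisher output above names its data as T1, so a call along these lines should reproduce it, assuming T1 holds the 2 × 3 table of counts (rebuilt here so the snippet runs on its own). Fisher's test computes its p-value exactly, conditioning on both the row and the column totals.

```r
# Same 2 x 3 table of counts as above
T1 <- matrix(c( 8, 24, 20,
               12, 26, 10),
             nrow = 2, byrow = TRUE,
             dimnames = list(Group = c("Active", "Placebo"),
                             Response = c("None", "Partial", "Complete")))

fisher_res <- fisher.test(T1)   # exact test, conditional on the margins
fisher_res
```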
dm1000 (see Classes 8-9)
dm1000 <- read_rds("c21/data/dm_1000.Rds") |>
  select(subject, tobacco, insurance) |>
  drop_na()
head(dm1000)
# A tibble: 6 × 3
subject tobacco insurance
<chr> <fct> <fct>
1 M-0001 Current Medicaid
2 M-0002 Never Commercial
3 M-0003 Former Medicare
4 M-0004 Never Medicaid
5 M-0005 Never Medicare
6 M-0006 Current Medicaid
dm1000 <- dm1000 |>
  mutate(tobacco =
           fct_relevel(tobacco, "Current", "Former"),
         insurance =
           fct_relevel(insurance, "Medicare",
                       "Commercial", "Medicaid"))
dm1000 |> tabyl(tobacco, insurance) |>
  adorn_totals(where = c("row", "col"))
 tobacco Medicare Commercial Medicaid Uninsured Total
Current 99 44 118 13 274
Former 183 70 103 11 367
Never 140 80 105 18 343
Total 422 194 326 42 984
dm1000 data
Pearson \(\chi^2\) results?
Pearson's Chi-squared test
data: tabyl(dm1000, insurance, tobacco)
X-squared = 25.592, df = 6, p-value = 0.0002651
Can we check our expected frequencies?
insurance Current Former Never
Medicare 99 183 140
Commercial 44 70 80
Medicaid 118 103 105
Uninsured 13 11 18
insurance Current Former Never
Medicare 117.50813 157.39228 147.09959
Commercial 54.02033 72.35569 67.62398
Medicaid 90.77642 121.58740 113.63618
Uninsured 11.69512 15.66463 14.64024
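To check these expected frequencies without the raw data file, one can rebuild the table from the observed counts shown above and call base R's chisq.test() directly. This is a sketch: the counts matrix is transcribed from that output, and the object names are ours. The Medicare/Current expected count, for example, is \(422 \times 274 / 984 \approx 117.51\).

```r
# Observed counts transcribed from the output above
# (rows = insurance, columns = tobacco)
counts <- matrix(c( 99, 183, 140,
                    44,  70,  80,
                   118, 103, 105,
                    13,  11,  18),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(
                   insurance = c("Medicare", "Commercial", "Medicaid", "Uninsured"),
                   tobacco   = c("Current", "Former", "Never")))

dm_res <- chisq.test(counts)
dm_res                      # compare with the Pearson output above
round(dm_res$expected, 2)   # (row total x column total) / 984 for each cell
```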
Any problems with Cochran conditions?
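A common statement of the Cochran conditions (assumed here; check it against the Course Notes) is that no expected count falls below 1 and at least 80% of the expected counts are 5 or more. A small sketch applying that rule to this table follows; cochran_ok is a hypothetical helper, not a function from any package used in class.

```r
# Cochran conditions as assumed here:
#   every expected count is at least 1, and
#   at least 80% of expected counts are 5 or more
cochran_ok <- function(expected) {
  all(expected >= 1) && mean(expected >= 5) >= 0.8
}

# Observed counts (rows = insurance, columns = tobacco)
counts <- matrix(c( 99, 183, 140,
                    44,  70,  80,
                   118, 103, 105,
                    13,  11,  18),
                 nrow = 4, byrow = TRUE)

# Expected counts under independence
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

min(expected)           # smallest expected count
cochran_ok(expected)
```

Here the smallest expected count is about 11.7 (the Uninsured/Current cell), so both conditions hold comfortably.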
Each rectangle’s area is proportional to the number of cases in that cell.
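The slides draw their mosaic plots with the vcd package. As a swapped-in base-R alternative (not the vcd code used in class), graphics::mosaicplot() draws the same kind of display from a table of counts. With shade = TRUE it colors each tile by its residual from an independence model, using blue for positive residuals and red for negative ones. The counts are transcribed from the tabyl above.

```r
# Counts from the tabyl above (rows = tobacco, columns = insurance)
counts <- matrix(c( 99,  44, 118, 13,
                   183,  70, 103, 11,
                   140,  80, 105, 18),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(
                   tobacco   = c("Current", "Former", "Never"),
                   insurance = c("Medicare", "Commercial", "Medicaid", "Uninsured")))

# Tile areas are proportional to cell counts; shade = TRUE colors tiles by
# residuals from the independence model (blue above, red below expectation)
mosaicplot(counts, shade = TRUE, main = "Insurance by tobacco status")
```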
vcd package (highlighting)
vcd package (with \(\chi^2\) shading)
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16 ucrt)
os Windows 11 x64 (build 22621)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_United States.utf8
ctype English_United States.utf8
tz America/New_York
date 2023-11-14
pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1)
digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1)
dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1)
evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1)
fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1)
farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.1)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1)
gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1)
htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1)
janitor * 2.2.0 2023-02-02 [1] CRAN (R 4.3.1)
jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1)
knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1)
lattice 0.21-8 2023-04-05 [2] CRAN (R 4.3.1)
lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1)
lmtest 0.9-40 2022-03-21 [1] CRAN (R 4.3.1)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.1)
MASS 7.3-60 2023-05-04 [2] CRAN (R 4.3.1)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.1)
patchwork * 1.1.3 2023-08-14 [1] CRAN (R 4.3.1)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.1)
readr * 2.1.4 2023-02-10 [1] CRAN (R 4.3.1)
rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1)
rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1)
rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1)
scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.1)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.3.1)
snakecase 0.11.1 2023-08-27 [1] CRAN (R 4.3.1)
stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)
stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.3.1)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.1)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.1)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.1)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.1)
timechange 0.2.0 2023-01-11 [1] CRAN (R 4.3.1)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1)
utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.1)
vcd * 1.4-11 2023-02-01 [1] CRAN (R 4.3.1)
vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1)
withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1)
xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1)
yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
zoo 1.8-12 2023-04-13 [1] CRAN (R 4.3.1)
[1] C:/Users/thoma/AppData/Local/R/win-library/4.3
[2] C:/Program Files/R/R-4.3.1/library
──────────────────────────────────────────────────────────────────────────────
431 Class 21 | 2023-11-14 | https://thomaselove.github.io/431-2023/